{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Highlights of the Code\n",
    "\n",
    "1. **Replication of Table A**  \n",
    "   This code replicates Table A from Internet Appendix C of the paper.\n",
    "\n",
    "<br>\n",
    "\n",
    "2. **Validating the Prediction Bands for Lasso-based Risk Premium Forecasts**  \n",
    "   Using Monte Carlo simulations, the code demonstrates that the paper's confidence intervals for\n",
    "   NN-based risk premium have accurate coverages. \n",
    "\n",
    "<br>\n",
    "\n",
    "3. **Validating the Prediction Bands for Neural-Network-based Risk Premium Forecasts**  \n",
    "   In addition, simulations confirm that the paper's prediction bands for the raw excess returns\n",
    "   are also correctly calibrated. This result validates the paper's prior choices and the \n",
    "   resultant posteriors for the factor loading coefficients. \n",
    "\n",
    "\n",
    "4. **Copyright Protection**  \n",
    "   The code is intended solely for research and non-commercial purposes.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "import numpy as np # linear algebra\n",
    "import multiprocessing as mp\n",
    "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n",
    "import seaborn as sns\n",
    "import numpy\n",
    "import pandas\n",
    "from joblib import Parallel, delayed\n",
    "import keras  \n",
    "from keras import regularizers\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras.wrappers.scikit_learn import KerasRegressor\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import Pipeline\n",
    "from keras.layers import BatchNormalization\n",
    "from keras.layers import Activation\n",
    "from keras import optimizers\n",
    "from sklearn.metrics import r2_score\n",
    "from keras import callbacks\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from keras.layers import Dropout\n",
    "from keras.models import load_model\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.linear_model import ElasticNet\n",
    "from numpy.linalg import inv\n",
    "from scipy import stats\n",
    "import scipy\n",
    "from scipy import stats\n",
    "from scipy.optimize import minimize\n",
    "from scipy.stats import invgauss\n",
    "from sklearn import linear_model\n",
    "#from scipy.stats import invgauss\n",
    "import scipy.stats as sc\n",
    "from numpy.linalg import matrix_rank\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from portsort import portsort as ps\n",
    "#import gc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np # linear algebra\n",
    "import multiprocessing as mp\n",
    "import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n",
    "import seaborn as sns\n",
    "import numpy\n",
    "import pandas\n",
    "from joblib import Parallel, delayed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import keras  \n",
    "from keras import regularizers\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense\n",
    "from keras.wrappers.scikit_learn import KerasRegressor\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import Pipeline\n",
    "from keras.layers import BatchNormalization\n",
    "from keras.layers import Activation\n",
    "from keras import optimizers\n",
    "from sklearn.metrics import r2_score\n",
    "from keras import callbacks\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from keras.layers import Dropout\n",
    "from keras.models import load_model\n",
    "from sklearn.metrics import mean_squared_error"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The simulated data perfectly mimic the simulations of Gu, Kelly, and Xiu (2020) (GKX, henceforth), except for the specification on the model residuals, which I model using a factor model. I thank GKX for making their Matlab codes publicly available, which I adapt (and attached separately) to obtain the simulated data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rsq_oos(y,ypred): \n",
    "    SSR = np.sum((y-ypred)**2)\n",
    "    MSS = np.sum((y)**2)\n",
    "    r_squared =1-SSR/MSS \n",
    "    return r_squared"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def Gibbs_sampler_lambda_wls(X, Y_tilde, xerror, errorbetain,h1, h2, n, r, delta):\n",
    "    \n",
    "    beta=np.zeros((n,X.shape[1]))\n",
    "    errorbeta=np.zeros((n,xerror.shape[1]))\n",
    "    tau_sq=np.zeros((n,X.shape[1]))\n",
    "    sigma_sq=np.zeros((n,))\n",
    "    lambda_sq=np.zeros((n,))\n",
    " \n",
    "    # Initialization\n",
    "    errorbeta[0]=errorbetain\n",
    "    beta[0] = np.random.uniform(size = X.shape[1])\n",
    "    sigma_sq[0] = np.random.uniform()\n",
    "    tau_sq[0] = np.random.uniform(size = X.shape[1])\n",
    "    lambda_sq[0] = np.random.uniform()\n",
    " \n",
    "      \n",
    "    Aerror= xerror.transpose().dot(xerror)\n",
    "    Aerrorinv=np.linalg.inv(Aerror)\n",
    "    \n",
    "    for i in range(n-1):\n",
    "        # Full conditional for beta\n",
    "        Y_error=Y_tilde-xerror.dot(errorbeta[i])\n",
    "        D_tau = np.diag(tau_sq[i])\n",
    "        D_tau_inv=np.linalg.inv(D_tau)\n",
    "        A = X.transpose().dot(X) + D_tau_inv\n",
    "        Ainv=np.linalg.inv(A)\n",
    "        multi_norm_mean = Ainv.dot(X.transpose()).dot(Y_error)\n",
    "        multi_norm_cov = sigma_sq[i] * Ainv\n",
    "        beta[i+1]=(np.random.multivariate_normal(multi_norm_mean, multi_norm_cov))\n",
    "        # Full conditional for sigma_sq\n",
    "        shape = (X.shape[0]-1+X.shape[1])/2\n",
    "        scale = ((Y_error - X.dot(beta[i+1])).dot((Y_error - X.dot(beta[i+1]))) + beta[i+1].transpose().dot(D_tau_inv).dot(beta[i+1]))/2\n",
    "        sigma_sq[i+1]=(sc.invgamma.rvs(a = shape, scale = scale))\n",
    "        # Full conditional for tau_1,...,tau_p\n",
    "        mean = np.sqrt(lambda_sq[i]*sigma_sq[i+1]/beta[i+1]**2)\n",
    "        scale = np.repeat(lambda_sq[i], X.shape[1])\n",
    "        tau_sq[i+1]=(1/np.random.wald(mean, scale))\n",
    "        # Full conditional for lambda_sq\n",
    "        shape = X.shape[1] + r\n",
    "        rate = sum(tau_sq[i+1])/2+delta\n",
    "        lambda_sq[i+1]=(np.random.gamma(shape, 1/rate))\n",
    "        \n",
    "        errortrain=Y_tilde-X.dot(beta[i+1])\n",
    "        xyerror=xerror.transpose().dot(errortrain)\n",
    "        errorbeta_mean=Aerrorinv.dot(xyerror)\n",
    "        \n",
    "        errorvar=np.var(Y_tilde-X.dot(beta[i+1])-xerror.dot(errorbeta[i]))\n",
    "        errorbeta[i+1]=np.random.multivariate_normal(h1*errorbeta_mean, h2*Aerrorinv*errorvar)\n",
    "    return beta[int(n/2):],errorbeta[int(n/2):]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "import tensorflow as tf\n",
    "seed=0\n",
    "\n",
    "random.seed(seed)\n",
    "np.random.seed(seed)\n",
    "tf.random.set_seed(seed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def confidence_levels():\n",
    "    \n",
    "    \n",
    "    risk_premium_coverages=np.zeros((100,10))\n",
    "    raw_return_coverages=np.zeros((100,10))\n",
    "    prob_95 = np.zeros(100)\n",
    "    prob_90 = np.zeros(100)\n",
    "    prob_80 = np.zeros(100)\n",
    "    prob_70=np.zeros(100)\n",
    "    prob_60=np.zeros(100)\n",
    "    prob_50=np.zeros(100)\n",
    "    prob_40=np.zeros(100)\n",
    "    prob_30=np.zeros(100)\n",
    "    prob_20=np.zeros(100)\n",
    "    prob_10=np.zeros(100)\n",
    "    prob_raw_95 = np.zeros(100)\n",
    "    prob_raw_90 = np.zeros(100)\n",
    "    prob_raw_80 = np.zeros(100)\n",
    "    prob_raw_70=np.zeros(100)\n",
    "    prob_raw_60=np.zeros(100)\n",
    "    prob_raw_50=np.zeros(100)\n",
    "    prob_raw_40=np.zeros(100)\n",
    "    prob_raw_30=np.zeros(100)\n",
    "    prob_raw_20=np.zeros(100)\n",
    "    prob_raw_10=np.zeros(100)\n",
    "\n",
    "\n",
    "    \n",
    "    for M in range(1,101):\n",
    "            \n",
    "        c=pd.read_csv('D:/bayesian jmp/Simu/SimuData_p100/c{}.csv'.format(M),header=None)\n",
    "        r1=pd.read_csv('D:/bayesian jmp/Simu/SimuData_p100/r1_{}.csv'.format(M),header=None)\n",
    "        er=pd.read_csv('D:/bayesian jmp/Simu/SimuData_p100/er1_{}.csv'.format(M),header=None)\n",
    "        f1=pd.read_csv('D:/bayesian jmp/Simu/SimuData_p100/f1_{}.csv'.format(M),header=None)\n",
    "\n",
    "        \n",
    "        \n",
    "        er=er.iloc[24000:36000]\n",
    "        er=er.reset_index().drop(columns='index')\n",
    "        er.columns=['TrueSimulatedER']\n",
    "\n",
    "\n",
    "\n",
    "        xtrain=c.iloc[0:12000,:] ##TRaining data\n",
    "        xtest=c.iloc[12000:24000,:]\n",
    "        xoos=c.iloc[24000:36000,:]\n",
    "\n",
    "\n",
    "        ytrain=r1.iloc[0:12000:,0]\n",
    "        ytest=r1.iloc[12000:24000,0]\n",
    "        yoos=r1.iloc[24000:36000,0]\n",
    "        \n",
    "        \n",
    "        xerror=f1.iloc[0:12000,:]\n",
    "        xterror=f1.iloc[12000:24000,:]\n",
    "        xooserror=(c.iloc[24000:36000,0:2]).reset_index().drop(columns=['index'])\n",
    "        \n",
    "        \n",
    "     \n",
    "        ytraind=(ytrain-np.mean(ytrain,axis=0))/np.std(ytrain)\n",
    "        \n",
    "\n",
    "        xtraind=(xtrain-np.mean(xtrain,axis=0))/np.std(xtrain)\n",
    "        xtestd=(xtest-np.mean(xtrain,axis=0))/np.std(xtrain)\n",
    "        xoosd=(xoos-np.mean(xtrain,axis=0))/np.std(xtrain)\n",
    "        errorbetain=[0,0]\n",
    "        factor_std=np.std(xerror.iloc[:,0]/(c.iloc[0:12000,0]))\n",
    "\n",
    "        beta_gibbs,errorbeta_gibbs=Gibbs_sampler_lambda_wls(xtraind, ytraind, xerror, errorbetain, 1, 1, 100, 0.15, 0.2)\n",
    "\n",
    "        beta_bayes=np.median(beta_gibbs,axis=0)\n",
    "        \n",
    "        yoosbayes=(xoosd.dot(beta_bayes)*np.std(ytrain)+np.mean(ytrain)).reset_index().drop(columns=\"index\")\n",
    "        yoosbayes.rename(columns={0:'yoosbayes'},inplace=True)\n",
    "        returns=yoos.reset_index().drop(columns='index')\n",
    "        returns.columns=['returns']\n",
    "\n",
    "\n",
    "        yoos_gibbs=np.zeros((50,yoos.shape[0]))\n",
    "\n",
    "        for i in range(50):\n",
    "            yoos_gibbs[i]=xoosd.dot(beta_gibbs[i])*np.std(ytrain)+np.mean(ytrain)\n",
    "        \n",
    "        yoosconf=np.std(yoos_gibbs,axis=0) ##Forecast standard error\n",
    "        yoosconf=pd.DataFrame(yoosconf, columns = ['yoosconf'])\n",
    "        \n",
    "        ytest_bayes=(xtestd.dot(beta_bayes)*np.std(ytrain)+np.mean(ytrain))\n",
    "\n",
    "##Computation of forecast variance attributable to model residuals\n",
    "        \n",
    "        errorstd=np.std(ytest-np.std(ytest)*np.dot(xterror,(np.mean(errorbeta_gibbs,axis=0).T))-ytest_bayes)\n",
    "        ##Estimating the error variance (Var(epsilon))\n",
    "        errorbeta_bayes=np.median(errorbeta_gibbs,axis=0)\n",
    "        yooserrorstd=np.sqrt((np.std(ytrain)*factor_std*xooserror.dot(errorbeta_bayes))**2+errorstd**2) \n",
    "        ##Error variance+variance due to common factor exposure                                        \n",
    "        yooserrorstd=pd.DataFrame(yooserrorstd, columns = ['yooserrorstd'])\n",
    "        \n",
    "    \n",
    "        Time=pd.DataFrame(np.repeat(range(60),200),columns=['Time'])\n",
    "        stockid=list(range(200))\n",
    "        Stockid=pd.DataFrame(stockid*60,columns=['StockID'])\n",
    "        final_data=pd.concat([Time,returns,er, yoosbayes,yoosconf,yooserrorstd,Stockid], axis=1)\n",
    "        final_data['tstat']=np.abs(final_data['TrueSimulatedER']-final_data['yoosbayes'])/final_data['yoosconf']\n",
    "        final_data['totalerrorstd']=(final_data['yooserrorstd']**2+final_data['yoosconf']**2)**0.5\n",
    "        \n",
    "        final_data['rawtstat']=np.abs(final_data['returns']-final_data['yoosbayes'])/final_data['totalerrorstd']\n",
    "\n",
    "\n",
    "        prob_95[M-1]=(final_data.loc[final_data['tstat'] <=1.96].shape[0]/12000) # 95\n",
    "        prob_90[M-1]=(final_data.loc[final_data['tstat'] <=1.65].shape[0]/12000) # 90\n",
    "        prob_80[M-1]=(final_data.loc[final_data['tstat'] <=1.282].shape[0]/12000) # 80\n",
    "        prob_70[M-1]=(final_data.loc[final_data['tstat'] <=1.036].shape[0]/12000) # 70\n",
    "        prob_60[M-1]=(final_data.loc[final_data['tstat'] <=0.842].shape[0]/12000) # 60\n",
    "\n",
    "        prob_50[M-1]=(final_data.loc[final_data['tstat'] <=0.674].shape[0]/12000) # 50\n",
    "        prob_40[M-1]=(final_data.loc[final_data['tstat'] <=0.524].shape[0]/12000) # 40\n",
    "        prob_30[M-1]=(final_data.loc[final_data['tstat'] <=0.385].shape[0]/12000) # 30\n",
    "\n",
    "        prob_20[M-1]=(final_data.loc[final_data['tstat'] <=0.253].shape[0]/12000) # 20\n",
    "        prob_10[M-1]=(final_data.loc[final_data['tstat'] <=0.126].shape[0]/12000) # 10\n",
    "        \n",
    "        \n",
    "        \n",
    "        \n",
    "                \n",
    "        prob_raw_95[M-1]=(final_data.loc[final_data['rawtstat'] <=1.96].shape[0]/12000) # 95-percentile based on t(5) dist\n",
    "        prob_raw_90[M-1]=(final_data.loc[final_data['rawtstat'] <=1.65].shape[0]/12000) # 90\n",
    "        prob_raw_80[M-1]=(final_data.loc[final_data['rawtstat'] <=1.282].shape[0]/12000) # 80\n",
    "        prob_raw_70[M-1]=(final_data.loc[final_data['rawtstat'] <=1.036].shape[0]/12000) # 70\n",
    "        prob_raw_60[M-1]=(final_data.loc[final_data['rawtstat'] <=0.842].shape[0]/12000) # 60\n",
    "\n",
    "        prob_raw_50[M-1]=(final_data.loc[final_data['rawtstat'] <=0.674].shape[0]/12000) # 50\n",
    "        prob_raw_40[M-1]=(final_data.loc[final_data['rawtstat'] <=0.524].shape[0]/12000) # 40\n",
    "        prob_raw_30[M-1]=(final_data.loc[final_data['rawtstat'] <=0.385].shape[0]/12000) # 30\n",
    "\n",
    "        prob_raw_20[M-1]=(final_data.loc[final_data['rawtstat'] <=0.253].shape[0]/12000) # 20\n",
    "        prob_raw_10[M-1]=(final_data.loc[final_data['rawtstat'] <=0.126].shape[0]/12000) # 10\n",
    "        \n",
    "        risk_premium_coverages[M-1,:]=[prob_95[M-1], prob_90[M-1], prob_80[M-1],prob_70[M-1],prob_60[M-1],prob_50[M-1],prob_40[M-1],prob_30[M-1],prob_20[M-1],prob_10[M-1]]\n",
    "        raw_return_coverages[M-1,:]=[prob_raw_95[M-1], prob_raw_90[M-1], prob_raw_80[M-1],prob_raw_70[M-1],prob_raw_60[M-1],prob_raw_50[M-1],prob_raw_40[M-1],prob_raw_30[M-1],prob_raw_20[M-1],prob_raw_10[M-1]]\n",
    "\n",
    "    \n",
    "      \n",
    "    return risk_premium_coverages,raw_return_coverages\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "risk_premium_coverages,raw_return_coverages=confidence_levels()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.95748667, 0.91454417, 0.822045  , 0.725505  , 0.626035  ,\n",
       "       0.52392583, 0.42140833, 0.31738583, 0.2117075 , 0.10642417])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.mean(risk_premium_coverages[range(100)],axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.950825  , 0.91334667, 0.8331225 , 0.7463125 , 0.65306667,\n",
       "       0.55279083, 0.44830667, 0.3400775 , 0.22884083, 0.1157525 ])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.mean(raw_return_coverages[range(100)],axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
